Changing the threshold value of the method selection #5
As we know, the `hist` method is much faster on CPU than its alternatives, especially on large datasets. At the moment, the heuristic's threshold for choosing between `hist` and `exact` is too high: 2^22 (~4.2M). We compared the performance and metrics of `hist` and `exact` on many workloads and concluded that 2^18 (~262k) is the optimal threshold. Below are brief tables with the best thresholds for different workloads. For each case, we chose the best threshold based on training time and two testing metrics. The threshold was grid-searched over powers of 2, starting from 256.
We used `accuracy` + `log_loss` for classification and `rmse` + `r2` for regression. "Optimal threshold" means the minimum data size at which `hist` starts performing at least as well as `exact`.
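The selection rule above can be sketched in Python as follows. This is a hypothetical illustration, not the script used for these experiments. It assumes a result layout of `results[n_rows][method][criterion]`. It also assumes that "at least as well" means `hist` ties or beats `exact` on training time and on both testing metrics, and that this must hold at every larger grid size as well. All function and variable names are made up for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical result layout (an assumption, not the original experiment format):
# results[n_rows]["hist" | "exact"] -> {"time": seconds, <metric>: value, ...}
Results = Dict[int, Dict[str, Dict[str, float]]]

# For each criterion: True if a larger value is better.
CLASSIFICATION_CRITERIA = {"time": False, "accuracy": True, "log_loss": False}
REGRESSION_CRITERIA = {"time": False, "rmse": False, "r2": True}


def power_of_two_grid(start: int, stop: int) -> List[int]:
    """Grid sizes start, 2*start, 4*start, ... up to and including stop."""
    sizes = []
    n = start
    while n <= stop:
        sizes.append(n)
        n *= 2
    return sizes


def hist_at_least_as_good(hist: Dict[str, float], exact: Dict[str, float],
                          criteria: Dict[str, bool]) -> bool:
    """True if hist ties or beats exact on every criterion."""
    for name, higher_is_better in criteria.items():
        if higher_is_better and hist[name] < exact[name]:
            return False
        if not higher_is_better and hist[name] > exact[name]:
            return False
    return True


def optimal_threshold(results: Results, criteria: Dict[str, bool]) -> Optional[int]:
    """Smallest grid size from which hist is at least as good as exact at that
    size and every larger one (assumed interpretation); None if there is none."""
    threshold = None
    # Walk from the largest size down and stop at the first size where hist loses.
    for n in sorted(results, reverse=True):
        if hist_at_least_as_good(results[n]["hist"], results[n]["exact"], criteria):
            threshold = n
        else:
            break
    return threshold
```

For example, `power_of_two_grid(256, 2 ** 18)` yields the 11 sizes from 256 to 262,144. Walking from the largest size down means one loss for `hist` at a mid-sized N keeps the threshold above that point.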
Before training, all datasets were randomly shuffled. Then the first N rows of each training dataset were selected for training, and the full testing datasets were used for testing. The procedure was repeated for both `hist` and `exact`.

Classification task:
Regression task:
HW:
CPU: Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz
Socket(s): 2
Core(s) per socket: 28
Thread(s) per core: 2
RAM: 24 × 16 GB (384 GB total)
The full table with all numbers can be found here.
This PR is part of making `hist` the default method when `tree_method==auto` (dmlc#7049).
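The following hypothetical Python sketch shows what lowering the threshold changes in a size-based choice between `hist` and `exact`. It is not XGBoost's actual implementation. The function name, the `>=` boundary, and treating data size as the number of training rows are assumptions made for illustration.

```python
# Hypothetical sketch of a size-based tree_method choice; not XGBoost code.
OLD_THRESHOLD = 2 ** 22  # 4,194,304 rows (previous threshold)
NEW_THRESHOLD = 2 ** 18  # 262,144 rows (threshold proposed in this PR)


def choose_tree_method(n_rows: int, threshold: int = NEW_THRESHOLD) -> str:
    """Assumed rule: use hist at or above the threshold, exact below it."""
    return "hist" if n_rows >= threshold else "exact"


if __name__ == "__main__":
    for n_rows in (100_000, 1_000_000, 5_000_000):
        old = choose_tree_method(n_rows, OLD_THRESHOLD)
        new = choose_tree_method(n_rows, NEW_THRESHOLD)
        print(f"{n_rows:>9,} rows: old={old:<5} new={new}")
```

Under these assumptions, a 1,000,000-row dataset switches from `exact` to `hist`. A 100,000-row dataset stays on `exact`, and a 5,000,000-row dataset uses `hist` under either threshold.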